and the need for improvement in the character of educational research
is voiced in various articles by Alexander (2), Chapman (38), 
McCall (110), Monroe (117), Symonds (170), Whipple (194), and 
Woody (208). 

2. General Psychology of Learning. There are upwards of a 
score of experimental studies ranging from maze learning to the 
evolution of concepts. Of interest in the field of transference of 
training are the second study of mental discipline in high school 
studies by Broyler, Thorndike, and Woodyard (30), and the inves- 
tigation of Bowers (24). The review on work and fatigue by 
Spencer (163) is useful to students in this field. 

A special interest in supervised study and directing study is 
reflected in the studies by Shreve (155), Douglas (55), Flemming and 
Woodring (64), Monroe (116), and others. Interested teachers 
should consult a valuable bibliography by Woodring and Flem- 
ming (207). 

A special interest also in the problem of homogeneous grouping 
is shown by the volume by Ryan and Crecelius (148), and the 
studies by Billett (15), Miller (115), Rainey and Anderson (137), 
Shields (153), Ulrich (183), Viele (184), Wilson (199, 200), and 
Worlton (209). The college student is the subject of many inves- 
tigations, notably by Book (19, 20, 21). 

3. Psychology of School Subjects. Interest in the psychology 
and pedagogy of reading continues unabated. Four books by Black- 
hurst (17), Gates (70), Good (74), and Wiley (197), and many spe- 
cial investigations have appeared during the year. Investigations 
during 1926 and 1927 to June 30 are summarized by Gray (75). 

The vocabulary studies have been enriched by the publication of 
the results of Horn’s (85) well known investigations and by the 
comparative study by Dolch (52). 

Arithmetic studies are represented by Buckingham (32, 33), 
Fowlkes (66), Heidbreder (83), O’Brien (124), Osburn (127), 
Otto (128), and Washburne (187). 

Special interest has been shown in English and in the social studies
in various experiments. The study of children’s interest in poetry by
Huber and others (86) and the studies of Feasey (61) and Nesmith (122)
reflect this interest.

4. Pre-School Education. In addition to the book by Forest (65) 
and the record of researches in child development by Marsten (112), 
there are numerous experimental studies in the psychology of infancy. 

5. Miscellaneous. In spite of the development of standard tests
and new type examinations, interest in improving school marks
continues. The studies of Banker (10), Bolton (18), Cocking and 
Holy (40), Darsie (46), Lauterbach (101), and Spence (161) are 
representative. A special interest in the measurement of teaching 
ability is reflected in the studies by Ballou (9), Bathurst (13), 
Betts (14), Tonks (179), and Symonds (172). Sex differences are 
treated systematically by Lincoln (106) and are studied in several 
other investigations. Visual education is represented by Mead (113), 
Ross (143), and Wilbur (196). 

INTELLIGENCE TESTS


BY RUDOLF PINTNER 
Teachers College, Columbia University 


General Books. Several books, dealing wholly or in part with 
intelligence tests, have recently appeared. Spearman (124) gives an 
extended treatment of his well-known theory. He gives a detailed 
criticism of existing theories, an account of the development of the 
two-factor theory, and assembles all the evidence for this theory. 
The tetrad difference method is fully explained in this book. Four 
general factors are discovered—general intelligence, inertia or lag, 
recuperation, and a conative factor called self-control. Kelley (68) 
would abolish the difference now made between intelligence and 
educational tests. He discusses the significance of the probable error 
and differentiates between the value of a test for group and for 
individual purposes. He discusses individual idiosyncrasy and 
emphasizes its value. About half of the book is devoted to the results 
of the ranking of all sorts of tests by seven judges, and to giving 
detailed information about each test. Two books, dealing with 
measurement in the high school field, have appeared, namely, by 
Symonds (134) and by Ruch and Stoddard (112). Both give con- 
siderable attention to the use of intelligence tests in secondary educa- 
tion. For the clinical worker there are also two new books, one by 
Wells (146) and the other by Wallin (142). Wells goes into great 
detail with reference to individual testing, and he gives numerous 
case studies. Wallin’s book covers a much wider field than merely 
the measurement of intelligence. Incidentally a good deal of infor- 
mation about the development of intelligence is included by Holling- 
worth (60) in his broader study of mental growth. 
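Kelley’s discussion of the probable error can be made concrete with the conventional large-sample formula for the probable error of a correlation coefficient, P.E. = 0.6745(1 − r²)/√N. The Python sketch below assumes that textbook formula; the function name and the sample values are illustrative only and are not taken from Kelley’s book.

```python
import math

# 0.6745 is the normal deviate bounding the middle 50 per cent of cases,
# so one probable error equals 0.6745 standard errors.
PE_FACTOR = 0.6745


def probable_error_of_r(r: float, n: int) -> float:
    """Large-sample probable error of a Pearson r computed from n cases."""
    if not -1.0 <= r <= 1.0 or n < 2:
        raise ValueError("need -1 <= r <= 1 and n >= 2")
    return PE_FACTOR * (1 - r * r) / math.sqrt(n)


# Hypothetical illustration: r = .50 obtained on 100 cases.
pe = probable_error_of_r(0.50, 100)
print(f"r = .50 +/- {pe:.3f} (P.E.)")
```

Because N enters under a square root, the same r is far more trustworthy on 1,000 cases than on 30, which is the kind of group-versus-individual distinction the book draws.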

The Meaning of Intelligence. Piéron (99) discusses the different 
concepts of intelligence and stresses the fact that intelligence is a 
value idea. Slocombe (120, 122) applies the tetrad-difference
criterion and supports the Spearman theory as of value in determining
the selection of tests for measuring intelligence. Thomson (137) 
claims that the tetrad-difference criterion does not prove the two- 
factor theory, because a theory of many group factors would satisfy 
the tetrad-difference criterion equally well. Strasheim (128) builds 
several tests based on the Spearman theory that the eduction of 
relations is basic in intelligence. Davey (30) finds that pictorial tests 
can measure intelligence as well as verbal tests. De Weerdt (33)
finds low correlations between general improvability on many tests 
and scores on an intelligence test. Pintner and Upshall (104), using 
a measure of social intelligence, find a high correlation with abstract 
verbal intelligence. McFarlane (87) believes that there is such a
thing as practical ability, the ability to analyze and judge concrete
spatial relations. The tests given are of the performance and
mechanical variety. The growth of intelligence of young children is
studied by Cunningham (26), who finds it practically a straight line
from age 2½ to 6 and finds a high correlation between Binet mental
ages and CAVD scores at these low levels.
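The tetrad-difference criterion discussed above can be stated for any four tests a, b, c, d with intercorrelations r: a tetrad difference such as r_ab·r_cd − r_ac·r_bd should vanish, apart from sampling error, if the tests correlate only through one general factor. The Python sketch below builds a correlation table from hypothetical g-loadings, confirms that its tetrads vanish, and then adds a hypothetical group factor shared by two tests to produce non-zero tetrads. It illustrates only the criterion itself; it does not model Thomson’s objection that other factor patterns can satisfy it equally well.

```python
def tetrad_differences(r, a, b, c, d):
    """Return the three tetrad differences for tests a, b, c, d.

    r is a symmetric table of intercorrelations, indexed r[i][j].
    """
    return (
        r[a][b] * r[c][d] - r[a][c] * r[b][d],
        r[a][c] * r[b][d] - r[a][d] * r[b][c],
        r[a][b] * r[c][d] - r[a][d] * r[b][c],
    )


# Hypothetical g-loadings. With a single general factor, r_ij = g_i * g_j.
g = [0.9, 0.8, 0.7, 0.6]
pure = [[1.0 if i == j else g[i] * g[j] for j in range(4)] for i in range(4)]

# Hypothetical group factor that raises only the correlation of tests 0 and 1.
grouped = [row[:] for row in pure]
grouped[0][1] = grouped[1][0] = pure[0][1] + 0.10

print("one general factor:", [round(t, 4) for t in tetrad_differences(pure, 0, 1, 2, 3)])
print("with group factor: ", [round(t, 4) for t in tetrad_differences(grouped, 0, 1, 2, 3)])
```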

The Constancy of the I.Q. Hildreth (58) gives results for 441
cases tested from two to eight times. For 441 retests the correlation
is .86; for 1,112 pairs of tests the correlation is .81. For 596 pairs
of tests by the same examiner the correlation is .87; for 488 pairs 
by different examiners the correlation is .79. Gray and Marsden (52) 
give the final results of their Binet retesting. For first retests the 
correlation is .88; for all comparisons it is .85. Randall (110) gives 
results for 152 cases with retest intervals up to five years. The cor- 
relation is .79 and the length of the time interval has no effect. By 
giving the Binet test twice on the same day to thirty cases, Lin- 
coln (80) finds a correlation of .95. The median I.Q. change is 3.4. 
For 144 Binet retests, Cushman (28) finds a correlation of .74. A 
correlation of .93 between retests on the Terman group test is reported 
by Broom (16). Cowdery (22) shows how the correlations between 
repeated Thorndike intelligence examinations decrease over a period 
from one to three years. Slocombe (121) argues that the I.Q. can- 
not be constant, and Cornell (21) stresses the fact that individuals 
may vary very greatly in I.Q. from test to test and hence the I.Q. is
of little help in individual clinical diagnosis. Pyle (109) finds much
overlapping between children of high, medium, and low I.Q.’s in
different learning tests. Cureton (27) works out a method of making
corrections for M.A. and C.A. in order to get I.Q.’s having the same
significance at all ages.
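The retest figures above rest on the ratio I.Q., 100 × M.A. ÷ C.A. As a minimal sketch (ages in months; the retest records are invented for illustration, and no correction of the kind Cureton proposes is applied), the median absolute I.Q. change over a set of retests, the statistic Lincoln reports, can be computed as follows.

```python
from statistics import median


def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Classical ratio I.Q.: 100 * M.A. / C.A., both ages in months."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months


# Hypothetical retest records: (M.A., C.A.) at first test, (M.A., C.A.) at retest.
records = [
    ((120, 120), (126, 126)),
    ((108, 120), (120, 132)),
    ((132, 120), (138, 132)),
]

changes = [abs(ratio_iq(*retest) - ratio_iq(*first)) for first, retest in records]
print("median I.Q. change:", round(median(changes), 1))
```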

Factors Influencing Intelligence Ratings. Merriman (90) shows 
that six hours’ coaching on the Thorndike College entrance tests leads 
to an appreciable increase in score on another form of the test, 
though all of the improvement is limited to Part I of the test. 
Slocombe (119) calculates the index of intellective saturation for
many retestings by means of several types of test. Since this index
is high at the second testing, he concludes that 25 per cent of testing
time should be used for fore-exercises. The Twenty-Seventh Yearbook of the
National Society (93) is devoted to this problem of the influence of 
various environmental factors on test scores. In this book Freeman 
finds an increase of 7 to 10 points in the I.Q. of foster children
moved to a good environment, while Burks, in a similar study, finds
an increase of 3 to 9 points. Willoughby and Jones contribute
articles giving parent-child correlations. Denworth and Heilman 
show the negligible effect of length of school attendance. May and 
Hartshorne find correlations around .47 for honesty tests between 
siblings. There are many other articles centering round the general 
problem. Teagarden (136) finds no increase in I.Q. in two children
transferred from a very bad to a very good environment over a period 
of five years. 

Symonds (135) lists 25 factors which influence test reliability. 
Lanier (75) discusses the Spearman prophecy formula for length of 
a test and applies it to actual tests. For the Otis test it works very 
well, but for musical tests very badly. Skaggs (117) criticizes the
concepts of validity and reliability as frequently employed in test 
construction. Popenoe (106) criticizes the A.Q. and finds it very 
unreliable. The correlation between two sets of A.Q.’s derived from 
repetitions of intelligence and achievement tests is only .28. Abel- 
son (1) does not find that the objective item analysis method of 
scoring for certain tests given to college freshmen is superior to the
usual scoring method, at least with the use of college marks as a 
criterion. Woodyard’s (150) thorough study of individual variability 
leads to the conclusion that an individual is as likely to be different 
after a few minutes as after days or weeks, up to an interval of a 
year at least. Correlations between tests are not lowered perceptibly 
by longer intervals up to a year. Walters (144) gave various tests 
with half time, standard time, and extended time, and found little 
change in the correlations with Stanford M.A. as a criterion. Farnsworth
(35) shows that speed of simple reaction is not highly related to speed
of choice reaction and has no correlation at all with the usual
intelligence test scores. Speed of choice reaction is positively
correlated with these scores, and the easier the choice test, the
higher the correlation.
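Lanier’s check of the Spearman prophecy formula can be outlined in code. In its usual form the formula predicts the reliability of a test lengthened k times as k·r ÷ [1 + (k − 1)·r], assuming the added material is parallel to the original; the sketch below implements that form and its inverse (the length factor needed to reach a target reliability). Function names and numbers are illustrative, not Lanier’s.

```python
def prophesied_reliability(r: float, k: float) -> float:
    """Spearman prophecy: reliability of a test lengthened k times.

    Assumes the added parts are parallel to the original test.
    """
    if k <= 0:
        raise ValueError("length factor k must be positive")
    return k * r / (1 + (k - 1) * r)


def length_factor_needed(r: float, target: float) -> float:
    """Invert the formula: how many times longer for the target reliability."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - r) / (r * (1 - target))


# Hypothetical test with a reliability of .60.
print(round(prophesied_reliability(0.60, 2), 3))   # doubled in length
print(round(length_factor_needed(0.60, 0.90), 2))  # length factor needed for .90
```

Comparing such a prediction with the reliability actually obtained for the lengthened test is the kind of check Lanier reports for the Otis and the musical tests.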

Kuhlmann (72) has elaborated the median mental age method for 
calculating mental ages. Lincoln (81) proposes to standardize tests
by means of a mental age distribution or by means of a chronological
age group whose I.Q.’s fall between 90 and 110. Bridges (13) discusses
various difficulties encountered in giving tests to pre-school
children, and Seago (114) makes an analysis of language factors 
entering into tests of various types given to university students. 
Gopfert (48) studies the Binet tests singly and shows the wide 
overlapping from age to age and grade to grade. 

Scales and Individual Tests. No new scale for the individual 
measurement of intelligence seems to have appeared, unless the term 
scale be applied to Goodenough’s (46) attempt to measure intelligence 
by the evaluation of a child’s drawing of a man. She reports high 
reliability and validity. In another direction, Snedden (123) suggests
a possible scale, using the interview method to obtain a measure of
intelligence without the examinee’s knowledge. The prospects for
constructing such disguised intelligence tests seem very good. A manual
of separate individual tests with directions and norms has been pre- 
pared by Bronner et al. (15). Bayley (6) describes performance 
tests suitable for three- to five-year-old children, and Blacking (7) 
presents the standardization of a bead-stringing test. Boge (9) 
describes three performance tests for “practical” intelligence, and 
Lichtenstein (79) discusses Gregor’s vocabulary test in great detail 
and makes a short vocabulary test with a mental age standardization. 

Group Tests. Very few new group intelligence tests have come 
to the notice of the reviewer. Kuhlmann and Anderson (73) have 
prepared a very elaborate and thorough series of group tests for ages 
six to maturity. The norms and standardization seem very complete. 
Pintner (102) presents a rapid survey test which he claims has the 
most objective and most rapid method of scoring of any test so far 
published. It is designed for grades 4 to 8. Reliability and validity 
coefficients are given. In the German literature, Lammermann (74) 
presents a standardization of a group test made up of opposites, arith- 
metical problems, and so forth; while Schafer (113) describes group 
completion and picture series tests. A new test of mechanical ability 
is described by MacQuarrie (83). Two studies about well estab- 
lished group tests have been made by Jones (66) and by Poull (107). 
The former deals with the validity of the Myers mental measure and 
the latter with the clinical value of the Rhode Island test. 

The Feebleminded. Bonnis (10) discusses in general the develop- 
ment of intelligence among the feebleminded. He presents results of 
repeated tests for over 200 cases. He plots these all on one chart, 
from which he derives hypothetical curves for the growth of different 
levels of intelligence. He finds that the I.Q. tends to decrease with
repeated tests. In a study of 441 retests of feebleminded children,
Minogue (91) also finds that a larger percentage show a loss in I.Q.
than a gain, although the largest group, 72 per cent, show a constant
I.Q., i.e., a change of not more than five points. Wallin (143) brings
together a great mass of data on the
problem of scattering on the Binet scale. He finds that normals 
scatter more than feebleminded, while the unstable group of psycho- 
paths, and the like, scatter a little more than the normals, but not 
enough to make this a diagnostic sign. Fox (38) compares normal 
and feebleminded children of the same mental age, and finds the 
tests on which the feebleminded are better and those on which the 
normals are better. Similarly Wilson (148) contrasts the learning 
ability of bright and dull children in a detailed study. In some tasks 
the learning curves for the bright and dull are identical; in others 
very different. He concludes that the more “mental” the task, the 
greater is the likelihood of differentiation between the bright and 
dull curves. 
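Minogue’s three-way tally of retests (loss, gain, or a constant I.Q., meaning a change of not more than five points) is easy to reproduce. The sketch below applies that five-point band to hypothetical retest pairs and reports percentages; the data are invented and the function name is illustrative.

```python
from collections import Counter


def classify_change(first_iq: float, retest_iq: float, band: float = 5.0) -> str:
    """'constant' if the I.Q. moves by no more than `band` points, else 'gain' or 'loss'."""
    diff = retest_iq - first_iq
    if abs(diff) <= band:
        return "constant"
    return "gain" if diff > 0 else "loss"


# Hypothetical (first I.Q., retest I.Q.) pairs.
retests = [(65, 63), (58, 50), (70, 77), (62, 62), (55, 48)]
counts = Counter(classify_change(first, again) for first, again in retests)
percentages = {label: 100 * n / len(retests) for label, n in counts.items()}
print(percentages)
```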

The Superior. Apart from the studies comparing bright and dull 
children noted in the previous paragraph, there are only two others 
dealing solely with bright children. Hollingworth (61) continues the 
report of a very bright individual who ten years ago tested at 187
I.Q. His CAVD score now places him about +4 P.E. above college
graduates. His scholastic record fulfills in every way the prognosis
made ten years ago. Witty and Lehman (149) discuss in a theoretical
article what they call “drive,” especially the kind of drive directed
toward overcoming some weakness. If children with high I.Q.’s are
typically well adjusted, they may have no reason for such drive and
hence may not be the outstanding geniuses of to-morrow.

The School Child. The general differences between bright and 
dull children as found in segregated classes are discussed by 
Baker (5), based upon his experience with X, Y, Z classes in Detroit. 
He takes up each school subject and discusses the different treatment 
necessary for the bright and the dull. Maher (86) reports the results 
of homogeneous grouping in the primary grades and considers 90 
per cent successful in such classes. Levy and Bartelme (78) tested 
thirty children on the Binet and found that their M.A.’s agree well 
with achievement on Montessori materials in a Montessori school. 
Comparing pupils of like M.A. and C.A., but of different grades, all 
having I.Q.’s below 90, Orleans (95) finds that the pupils in the
higher grades achieve more on objective achievement tests. In a
study of 100 nonpromoted children Stalnaker and Roller (126) found
91 per cent with I.Q.’s below 90 and 31 per cent with I.Q.’s below
70. Adler (2) gives mental test results for over 5,000 cases as part
of a larger mental health survey. Distributions of I.Q.’s by schools, 
by sex, and by grade are given. About 5 per cent were deemed 
suitable for special classes. 

Duthil (34) reports a French translation of one of the Otis tests 
and the results for 221 cases. The average score of the French 
thirteen-year-olds corresponds to the 12-6 U.S.A. norm. In Germany,
Weigl (145) gives a description of three group tests, i.e., classification,
analogies, and number series, given to nine- and ten-year-old
children. Rogers (111) makes the second report of tests given to 
private school pupils. Various group tests are reported for over 
3,000 cases. The median I.Q. is about 115. 

In the junior high school Maddocks (84) reports the results of 
100 cases of those who failed in any subject, and finds that 56 per 
cent fall below an I.Q. of 90. Shewrman (116) gives the correlations
between the Terman group test and school marks in high school
after four years for the graduating class. The coefficients lie between
.43 and .70. A retest by the Terman group test after three and one-half
years gives a correlation of .77. Hildreth (59) gives the results
of the Thorndike College entrance test for a senior high school and 
finds only 6 per cent probably poor college material. A comparison 
of the Binet I.Q.’s and the Thorndike scores is also given. Hurd (62) 
reports a correlation of .76 between the score on a physics test and 
the average I.Q. on two intelligence tests for 58 pupils in grade XI. 
Strickland (129) gives results for over 1,000 senior high school 
pupils on an intelligence test made up of the usual type of material. 
Sex differences show 67 per cent of boys reaching or excelling the 
girls’ median. He also reports Thorndike scores for college freshmen.
Sudweeks (131) reports the results for nearly 2,000 continuation
school children tested by the Terman group test. The average
I.Q. is 85.5.

College Students. Brigham (14), in the second annual report on 
scholastic aptitude tests for the College Entrance Board, gives a 
detailed analysis of each of the nine subtests. The tetrad difference 
equation is used to discover specific factors common to any two tests. 
The reliability of the whole test is reported to be about .95. Stal- 
naker (127) studies the differentiating power of the seven subtests 
of the American Council psychological examination, and Crane (23)
gives the results for the Thurstone and Thorndike tests at Bryn Mawr. 

Many authors report correlations between intelligence tests and 
academic grades. Cleeton (20) finds correlations of .50 for the
Thorndike College entrance and for the Iowa content examination. 
Guiler (53) finds correlations of .40 to .52 with the Ohio College 
test, Otis group, and Terman group. He considers the Otis best. 
Grauer and Root (51) report a correlation of .39 for the Thorndike
test. They give case studies and conclude that students should not 
be excluded on the basis of the Thorndike score alone. Nelson and 
Denny (94) find correlations of .77 and .64 between the Terman 
group and grades in psychology. They also give results for 1,250 
freshmen tested on the Terman and Thurstone tests. Carter (19) 
compares the correlation of .38 between an English test and semester
grades with the correlation of .45 between an intelligence test and
the same grades. Owens (96) compares the Army Alpha and the
American Council tests given to college students. 

With normal school students, we have the study of Keator and 
Bechtel (67), who give the results of the Thorndike intelligence test 
for two successive years in four Connecticut normal schools. They 
arrive at a tentative critical score below which a student should not 
fall. Waddell (140) compares the Army Alpha scores of the students 
in a teachers college with those in other colleges. He gives comparisons
of scores and college grades and concludes that a low standing
in Alpha indicates unfitness for college work and for practice
teaching.

Jones (65) compares the Alpha scores of Columbia College and 
Columbia Extension students and finds that the former make much 
higher scores. Kornhauser (70) studies the students in the School 
of Commerce and finds intelligence test scores better than high school 
marks for predicting college grades. Crawford (24) believes that 
the giving of scholarships acts as a motivation for academic work and 
that this raises the correlation between grades and intelligence scores. 
Jones (64) reports results of intelligence tests of inferior freshmen 
and the effect of special training. Brotemarkle (17) describes an 
elaborate scheme for the individual testing of college students in 
connection with the general problem of personnel work. Spence (125) 
describes the numerous factors in addition to intelligence which are 
related to college achievement. He finds a negative correlation 
between intelligence and time spent in study. He concludes that the 
most important factor for success in academic work is general
intelligence. 

The Delinquent. Healy and Bronner (55), in their work on 
juvenile delinquents, find 13.5 per cent clearly feebleminded. Their 
distribution of the I.Q.’s of 4,000 repeated delinquents shows a mode 
around 90 I.Q. A larger percentage of the feebleminded fail to
respond to probation treatment as compared with the normal. Slaw- 
son’s (118) book gives a very detailed analysis of many tests given 
to about 500 delinquent boys. He discusses the difficulty of deter- 
mining the I.Q.’s and shows the great difference in I.Q. distribution 
resulting from the use of 16 and 14 as a divisor. He finds the boys 
do better on the Thorndike nonlanguage test than on the N.I.T., and 
that they are up to the norms for city school children on the Stenquist. 
Murchison (92) gives a detailed account of the results of testing 
nearly 4,000 white penitentiary prisoners with the Army Alpha. He 
finds that they are somewhat better than the white draft, and that 
recidivists are better than first offenders. Kuhlmann (71) gives a 
percentage distribution of the I.Q.’s of delinquents in five institutions 
and finds from 24 to 42 per cent below an I.Q. of 75. Bridges (12) 
studies 33 delinquent girls and finds an average I.Q. of 88 on the
N.I.T. and 82 on the Myers mental measure. He also gives results
for the Mathews questionnaire and the Woodworth emotional test. 
Sullivan (132) gives the I.Q. distribution of boys entering Whittier 
State School, finding a mean I.Q. of 90, where a policy of sending 
the definitely feebleminded to other institutions exists. Boynton (11) 
reports the results for twenty-one twelve-year-old boys in a reform 
school, finding all below an I.Q. of 87. Asher (4) finds a median
I.Q. of 67 on the Binet for twenty boys in a reform school. On the 
Stenquist assembly tests they do about average and there is little 
overlap between the Binet and Stenquist ratings. Stryker (130) 
presents a case study of a delinquent boy with an I.Q. of 93, and 
finds that undergrading was the cause of his bad conduct. 
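Slawson’s point about the divisor concerns the chronological age used for older subjects: in the ratio I.Q., C.A. is capped at an adult ceiling, and whether that ceiling is 16 or 14 changes every I.Q. computed above it. A minimal sketch, using a hypothetical seventeen-year-old with a mental age of 12 (the function name is illustrative):

```python
def capped_ratio_iq(mental_age: float, chronological_age: float, ceiling: float) -> float:
    """Ratio I.Q. with chronological age capped at an adult ceiling (ages in years)."""
    effective_ca = min(chronological_age, ceiling)
    if effective_ca <= 0:
        raise ValueError("ages must be positive")
    return 100 * mental_age / effective_ca


# Hypothetical delinquent boy aged 17 with a mental age of 12.
for ceiling in (16, 14):
    print(f"divisor {ceiling}: I.Q. = {capped_ratio_iq(12, 17, ceiling):.0f}")
```

For every boy aged sixteen or over, the smaller divisor raises the I.Q. by a factor of 16/14, about 14 per cent, which is enough to shift an entire distribution.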

The Deaf and Blind. Pintner (100, 103) presents the results of
over 4,000 deaf children tested on the Pintner nonlanguage and 
educational survey tests. He compares the day and residential 
schools, the effect of the age of becoming deaf, and the methods of 
instruction. A comparison of the deaf and hearing shows the very 
marked retardation of the deaf in both intelligence and achievement. 
Hayes (54) gives a brief summary of the intelligence and achieve- 
ment testing done in schools for the blind. 

Racial Comparisons. Graham (50) reports group or individual
tests given to over 3,000 negro children in Atlanta, Georgia. The 
difference between the means of the white and colored increases
markedly with age; at age seven the two groups are about equal.
Davis (31) gives results for 222 negro normal school students and 
finds a median I.Q. of 78 on the Terman group test. He believes 
that lack of schooling is largely responsible for this. Herskovits (57) 
presents results for 539 negro college students on the Thorndike 
College entrance test. He finds no significant correlation between 
intelligence score and anthropological measures of amount of white 
mixture. 

Garth and Garrett (43) give the results of the N.I.T. for over 
2,000 Indian children. They find an increase in I.Q. with increase 
in white mixture. The I.Q.’s of the Indians range from 70 to 91 
as compared with a white I.Q. of 100. Garth (42) finds a correlation
of .42 between degree of white blood and intelligence in a study of
Indians. Fitzgerald and Ludeman (37) find a slight correlation 
among Indians between percentage of white blood and intelligence. 
The median I.Q. on intelligence tests is about 6.88. 

Paschal and Sullivan (97) give the results for 204 nine-year-old 
and 211 eleven-year-old Mexican children on six performance tests. 
They fall below the American norm on all tests. There is a positive 
correlation between amount of white blood and intelligence score.
Garretson (41) finds Mexican children lower than American children 
on both the N.I.T. and the Myers mental measure, the difference 
being greater on the former test. 

Graham (49) finds Chinese children in San Francisco superior 
to American on the Kohs test, but inferior on the mentimeter and 
N.I.T. On the Stanford their average I.Q. is 87. Darsie (29)
reports results for 658 American-born Japanese children. Their
median I.Q. is about 90 on the Binet, but they are equal to the
American norms on the Army Beta. Mead (88) tests Italian chil- 
dren in America and finds a mean I.Q. of 95 on the Binet for 43 
cases. On the Otis group test only 7 per cent out of 276 cases are 
above the median American I.Q. The mean score increases with
increase in amount of English spoken in the home, with length of 
stay of fathers in this country, and with progress up the grades. 
Pintner (101) compares 271 Belgian children tested in Belgium with 
the American norms on the Pintner nonlanguage test and finds no 
difference in the mean scores for ages nine to fourteen. 

The Employee. In his book on employment psychology Burt (18)
discusses intelligence tests and their uses in various fields of employ- 
ment. Two studies have appeared dealing with the intelligence of 
policemen. Merrill (89) reports an average Army Alpha score of 
104 for 113 applicants to the police force. Fernald and Sullivan (36) 
find a mean score of 82 on the Army Alpha for 1,712 men on a city 
police force. They give a distribution according to the army ratings. 
Pyle (108) gives intelligence tests to teachers and finds that “intelligence
is a just-barely-perceptible factor in school success.”
Pond (105) combines seven tests from Army Alpha and Beta into 
a new test and gives this to all newly hired workers in a metal 
industry for a year. Critical score ranges for different jobs are then 
determined. Freyd (39) gives a description of available tests with 
a summary of results for the selection of typists and stenographers. 

Inheritance. Goodenough (47) finds a correlation of about .3
between the intelligence of 380 pre-school children and the educational
status of their parents. Gesell and Lord (45) compare nursery
school children of equal age of low and high economic status and find 
the psychographs of the latter in general superior to those of the 
former group. Aldrich (3) compares the I.Q.’s of 1,100 high school 
pupils according to the fathers’ occupations. The labor groups are 
slightly lower than the nonlabor groups. Jones and Carr-Saun- 
ders (63) find the same relationship of I.Q.’s to occupation among 
orphan children brought up in the same environment as among 
children not in orphanages. Blanchard and Paynter (8) give the 
I.Q.’s for 80 children from “marginal” families, and find over 50
per cent below I.Q. 90. Sutherland and Thomson (133) in England
and Lentz (77) in this country discuss the correlation between I.Q.
and size of family. The former find negative correlations of about
.2 and the latter of .3. Lentz shows a steady decrease in I.Q. from
108 for only children to 80 for families of 12 or more. Wahl- 
quist (141) compares urban and rural children and finds the usual 
superiority of the urban group. 

Miscellaneous. Thorndike (138, 139) finds that boys excel girls 
by about 5 points on the I.E.R. tests at ages thirteen to sixteen and 
by about 17 points at ages seventeen to eighteen. Differential selec- 
tion is probably operating. Boys are slightly more variable than 
girls, and there are more very high scores among the boys. Whip- 
ple (147) finds high school boys superior to girls on the Alpha. He 
finds the subtests on which they show their superiority. Goodenough
(48) makes a summary of the sex differences so far reported
and gives her results for 300 pre-school children. She finds the girls 
superior on verbal tests and the boys on formboard tests. Per- 
kins (98) finds a positive correlation between M.A. and number of 
teeth, after eliminating C.A., in a study of 555 children. Shel- 
don (115) duplicates Naccarati’s investigation with 450 students and 
finds a correlation of .14 between intelligence and the morphologic 
index. 
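Perkins’s correlation between M.A. and number of teeth “after eliminating C.A.” is, in the usual notation of the period, a first-order partial correlation; the review does not state the method, so treating it as such is an assumption. With 1 = M.A., 2 = number of teeth, and 3 = C.A., r12.3 = (r12 − r13·r23) / √((1 − r13²)(1 − r23²)). The correlations in the sketch are hypothetical.

```python
import math


def partial_r(r12: float, r13: float, r23: float) -> float:
    """First-order partial correlation of variables 1 and 2 with variable 3 held constant."""
    denom = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    if denom == 0:
        raise ValueError("partial correlation undefined when r13 or r23 is +/-1")
    return (r12 - r13 * r23) / denom


# Hypothetical values: 1 = M.A., 2 = number of teeth, 3 = C.A.
r_ma_teeth, r_ma_ca, r_teeth_ca = 0.70, 0.80, 0.75
print(round(partial_r(r_ma_teeth, r_ma_ca, r_teeth_ca), 3))
```

Most of a raw correlation like .70 here would arise simply because both variables grow with age; the partial coefficient keeps only the part not accounted for by C.A.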

Gaskill et al. (44) study the ability to estimate intelligence from 
photographs and find a median correlation of +.42 between the estimates
made from the photographs by 274 judges and the actual I.Q.’s of the
12 eleven-year-old boys, which ranged from 18 to 171. Magson (85) finds that an estimate of
intelligence based upon a five-minute interview correlates only +.15 
with objective tests of intelligence. The correlation between a 
mature estimate and intelligence tests is .54. Wyatt (151) finds a 
low correlation between monotonous work (soap wrapping) and 
intelligence for 30 factory girls. Henig (56) believes there is a cor- 
relation between intelligence and freedom from accidents among 164 
boys in a vocational school. Koch (69) shows the similarity of the 
test scores for a pair of Siamese twins. In a study of 62 pairs of 
chums, Furfey (40) finds low but positive correlations for C.A., 
M.A., height, weight, and developmental age. Lehman and 
Witty (76) find that dull children take more to social games than do 
bright children. The index of social participation decreases with 
increase of M.A. and C.A. 

Dearborn (32) uses the Binet and other intelligence tests to deter- 
mine intellectual regression and progression with adult subjects. 
Lowe (82) gives a percentage distribution of the I.Q.’s for 344
unmarried mothers, finding 24 per cent with an I.Q. below 75. Cun- 
ningham (25) reports a scale for measuring the gross motor develop- 
ment of young children and finds it correlates highly with mental age. 
Zyve (152) describes a scientific aptitude test which differentiates 
between science and nonscience students. 
EDUCATIONAL TESTS 
BY VERNON JONES¹
Clark University 


General. Four textbooks concerned with the general problems 
of educational testing have appeared during the year in this country. 
Ruch and Stoddard (58) give a brief sketch of the history and 
present status of measurement in high schools. The uses and limi- 
tations of tests at this level are discussed and the point is made that 
the greatest possible benefits are not being obtained in secondary 
school testing at present, due mainly to the inaccuracies of measuring 
instruments, on the one hand, and to current misunderstanding among 
school officials of the meaning of results, on the other. Symonds (63) 
summarizes the reasons why better measurement is needed in high 
schools. One of his main points is that ordinary school marks at 
present are seriously unreliable, and he recommends the use of 
standardized tests and new type informal examinations as means for 
the improvement of conditions. The value of tests in predicting 
success in academic work and in clerical and mechanical pursuits is 
examined in detail. Kelley (38) deals with the general problem of 
interpreting test results in light of their reliabilities and validities. 
He makes distinctions among the various purposes for which tests 
are used. Minimal reliabilities required of a test for these various 
purposes are discussed. 

In addition to these three books which have appeared in their 
first editions, mention should be made of the appearance of a revision 
of the elementary text on measurement by Lincoln (41). 


Extension of Educational Measurement. The ordinary observer 
is baffled by the rapid increase in the number of proposed measuring 
instruments. A new examination that applies for admission to the 
company of standard measures need pass no entrance examination; 
there is no standard test for standard tests. Some of 
the new survey tests are of high quality and are carefully standardized; 
some are merely lists of questions about which we are given 
little or no experimental evidence. Owing to limited space, not all tests 
which have been proposed for general use can be mentioned; 
however, in noting the increase in the number of tests in old fields 
and the extension of measurement to new fields, the most significant 
new survey tests will be mentioned. Tests which are primarily diagnostic 
or prognostic in nature are discussed under separate headings 
later. 

¹ The writer is indebted to Miss Dorothea Johannsen, a graduate student 
of Clark University, for assistance in the preparation of this review. 

Important contributions in the form of series of tests have been 
made by three groups of workers. First, the Columbia Research 
series, under the authorship of Wood and others, contains new tests 
in algebra (51) and American history (10). Second, the workers 
in the Modern Foreign Language Study of the American Council on 
Education have devised seven tests: The Alpha French test, by 
Henmon and others (31); the Alpha German test, by Henmon and 
others (32); the Alpha Spanish test, by Buchanan and others (8); 
the Beta French test, by Greenberg and Wood (26); the Beta Span- 
ish test, by Callcott, Williams, and Wood (9); the French grammar 
test, by Cheydleur (14); and the German reading scales, by Van 
Wagenen and Patterson (73). Each test of these two series is 
accompanied by a manual which contains norms and directions for 
administering and scoring. Third, the Harvard test series (29) con- 
tains twelve tests in Latin, one in French, one in high school chem- 
istry, two in physics, and two in social studies. The test elements 
were selected on the basis of careful analyses of current textbooks, 
questions from college entrance examinations, word counts, and the 
like. Facts on reliability are given for two out of the eighteen tests. 
Norms are reported for about one-half the tests. Thirty copies of 
each test are bound in loose-leaf pads. 

Sangren and Woody (59) have devised a new reading test for 
grades IV to VIII. Parker and Waterbury (52) have also produced 
a reading test. They recommend it for use in grades II to IX. 
Beery (4) has attempted to devise a test for use with very young 
children to determine their readiness to begin reading. Abbott (1) 
has made a set of standard themes for use in analyzing and grading 
general merit in English composition. 

Several tests may be mentioned to illustrate the extension of 
measurement to new or relatively new fields. Toops (72) describes 
a test which purports to measure study habits. In constructing the 
test an effort was made to select sub-tests which yield high correlations 
with scholarship but low correlations with general intelligence. The 
test has been given to the entering classes in many colleges, and a 
follow-up program is planned. O’Brien and Giblette (45) have 
sought to measure achievement in sewing. Richards (56) describes 
a test which he has devised in biology. This test was found to cor- 
relate .71 with the only other test in this field, namely, the Ruch- 
Crossman test. The author reports a reliability of .62, and gives 
tentative norms based on 303 high school and college students. 

Other new tests which have come to the attention of the reviewer 
are as follows: Test in American poetry, by Cavins (12); a reading 
test in Spanish (15) and a Spanish vocabulary test (16) by Contreras, 
Broom, and Kaulfers; a primary reading test, by Williams (79); 
a narrative reading test, by Stone and Buehrmann (60); a reading 
test, by Van Wagenen (73); and a test in food preparation, by 
Streeter and Trilling (62). 

Intensive Study of Current Instruments and Methods. Horn (35) 
emphasizes the importance of examining the content that goes into 
tests in light of its social value. Hill (34) stresses the point that 
tests should be constructed only by those who are familiar with the 
objectives, the content, and the methods in a given subject. 

The necessity for higher reliability and validity in tests is strongly 
emphasized by several writers. Kelley (38) concludes that unless 
our measures are greatly improved in reliability and validity, the 
accomplishment quotient technique must be discarded for use in indi- 
vidual diagnosis. He states that current group measures of intel- 
ligence and educational achievement are rather unreliable for indi- 
vidual measurement, and he feels that any ratio based on these two 
is especially unreliable since they measure to such a large extent the 
same thing. He says: “On the average, in the neighborhood of 90 
per cent of the capacity measured by an all-round achievement battery 
score and the capacity measured by a general intelligence test is one 
and the same thing.” For practical school purposes he considers it 
vastly more important that a test should have high reliability and 
validity than that it should have widely established norms. Pope- 
noe (53) finds the reliability of the accomplishment quotient to be 
.28. Moreover, he finds a correlation of −.46 between accomplishment 
quotients and intelligence quotients. He concludes that the 
reliability of the A.Q. is too small to justify the use of this measure 
educationally. Ruch and Stoddard (58) emphasize the necessity 
for improvement in the reliability and validity of tests if they are to 
be relied upon for individual measurement. They point out the fact 
that though a test may be suitable for group diagnosis, it may yield 
very unreliable results for an individual. Hull (36) studies the efficiency 
of tests in forecasting future performance. He concludes that 
unless radical improvement takes place in the accuracy of measurement, 
tests are “doomed to operate at an efficiency of 40 or 50 per 
cent or less.” Muenzinger (44) emphasizes individual idiosyncrasies 
and concludes that tests at their present stage of reliability and 
validity are untrustworthy for use in studying individuals. 
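Kelley's and Popenoe's objection can be made concrete with the classical formula for the reliability of a difference between two measures expressed in standard scores. The A.Q. is a ratio rather than a difference, so this sketch only approximates the argument. It assumes equal variances and treats the ratio, on a logarithmic scale, as a difference. The function name and all figures are hypothetical.

```python
def difference_reliability(r_11: float, r_22: float, r_12: float) -> float:
    """Reliability of the difference between two measures in standard-score
    form (equal variances assumed).

    r_11, r_22: reliabilities of the two measures
    r_12: correlation between them
    """
    if r_12 >= 1:
        raise ValueError("perfectly correlated measures leave no reliable difference")
    return ((r_11 + r_22) / 2 - r_12) / (1 - r_12)

# Hypothetical figures, not Kelley's or Popenoe's: an intelligence test and an
# achievement battery, each reliable at .90, correlating to various degrees.
for r_12 in (0.50, 0.70, 0.80, 0.85):
    print(r_12, round(difference_reliability(0.90, 0.90, r_12), 2))
```

Two tests each reliable at .90 yield a difference reliable at .80 when they correlate .50. When they correlate .80, that figure drops to .50, and at .85 it is about .33. The more nearly two measures are "one and the same thing," the less reliable any comparison between them becomes.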

Madsen (43) studied the reliability of the scoring of an objective 
examination. Forty-seven normal school seniors were asked to score 
one Stanford achievement test each. The experimenter reports that 
15 of these 47 untrained scorers made mistakes aggregating 33 in 
number. Fifteen out of the 33 errors were made in connection with 
omitted items. 

Thurstone (70) examined the assumptions underlying the con- 
struction of product scales in handwriting, drawing, and composition 
on the basis of the Cattell-Fullerton theorem. He concludes that 
neither are equally often noticed differences equal nor are equal dif- 
ferences equally often noticed, unless the discriminal dispersions of 
the specimens are equivalent. Finding that none of the authors using 
this scaling technique has proved that the variabilities in judgments 
of all specimens were uniform, he feels that it is unlikely that this 
assumption of equality in dispersion is justified and, therefore, that 
the original scales are most probably faulty. The author (71) 
extends his examination to another method of scale construction, 
namely, the variability of grade method as illustrated in the con- 
struction of the Trabue language scales of 1916. He emphasizes the 
point that the probable error method makes the invalid assumption 
that the variability in each of the different grades is the same. He 
offers a solution to the difficulty noted. Using the method which he 
proposed in 1925, Thurstone shows how the Trabue test, and others 
similarly constructed, can be scaled so as to take into account both 
lateral displacement and differences in dispersion from grade to grade. 
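Thurstone's objection rests on his law of comparative judgment. Under that law, the scale distance between two specimens depends on two things: the proportion of judgments preferring one specimen to the other, and the discriminal dispersions of the two specimens. With equal dispersions and zero correlation, the law reduces to his Case V. The following is a minimal sketch assuming normally distributed discriminal processes. `scale_separation` and the proportions and dispersions used are hypothetical illustrations, not Thurstone's data.

```python
from math import sqrt
from statistics import NormalDist

def scale_separation(p_j_over_k: float, sigma_j: float, sigma_k: float,
                     r_jk: float = 0.0) -> float:
    """Scale distance S_j - S_k implied by the proportion of judgments that
    place specimen j above specimen k, given the discriminal dispersions
    sigma_j, sigma_k and their correlation r_jk."""
    if not 0 < p_j_over_k < 1:
        raise ValueError("proportion must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(p_j_over_k)  # unit-normal deviate for the proportion
    return z * sqrt(sigma_j ** 2 + sigma_k ** 2 - 2 * r_jk * sigma_j * sigma_k)

p = 0.75  # hypothetical: 75 per cent of judges rate specimen j above k
equal = scale_separation(p, 1.0, 1.0)    # Case V: equal dispersions, r = 0
unequal = scale_separation(p, 1.0, 2.0)  # specimen k judged far less consistently
print(round(equal, 2), round(unequal, 2))
```

The same 75 per cent preference implies a separation of about 0.95 units when the two dispersions are equal, but about 1.51 units when one dispersion is doubled. A scale built on "equally often noticed differences" is therefore correct only if the dispersions really are uniform.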

Brigham (7) studied the relative merits of the current methods 
of determining the reliability of tests, and he concludes that the two- 
forms method and the split-half method yield approximately equiva- 
lent results. 
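Brigham's comparison concerns two ways of estimating reliability. One correlates two equivalent forms. The other correlates two halves of a single form and steps the result up to full length with the Spearman-Brown formula. The sketch below assumes an odd-even split and 0/1 item scoring. The data and function names are hypothetical, and Brigham's exact procedure may differ.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary to compute a correlation")
    return sxy / sqrt(sxx * syy)

def spearman_brown(r, length_factor=2):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * r / (1 + (length_factor - 1) * r)

def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item results per pupil.
    Correlates odd-item and even-item half scores, then steps the
    half-length correlation up to full length."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd, even))

def two_forms_reliability(form_a_totals, form_b_totals):
    """Correlation between pupils' totals on two equivalent forms."""
    return pearson_r(form_a_totals, form_b_totals)

# Hypothetical responses of five pupils to a six-item test.
pupils = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
print(round(split_half_reliability(pupils), 2))
```

The Spearman-Brown step is needed because each half is only half as long as the test whose reliability is wanted. The two-forms estimate needs no such correction, since each form is already full length.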


Use of Tests in Evaluating Instruction. Ruch and Stoddard (58) 
state that the use of tests for the supervision of instruction has 
been the most important function of such measures. Wallis (76) 
emphasizes the importance of educational tests in comparing schools. 
O’Hearn (46) describes a cooperative survey conducted in the schools 
of Rochester, N. Y. He feels that tests are of great value to teachers 
and supervisors in taking inventory. He thinks, however, that unless 
remedial measures follow the evaluating of conditions the testing 
program is a waste of time and money. 

Five out of the six important educational surveys reported in 1927 
made extensive use of educational tests in evaluating instruction. In 
the survey of Jacksonville, Florida (66), tests were given in reading, 
arithmetic, spelling, history, and English grammar and composition. 
Practically all conclusions on the efficiency of instruction are based 
on comparison of the groups with grade or age standards. In the 
survey of Lynn, Massachusetts (68), educational tests were given 
in reading, arithmetic, and spelling in the elementary schools. The 
evaluation of instruction in the subjects measured is based exclusively 
on the test results; however, the survey staff feels that it is impossible 
at the present stage of development to measure successfully in such 
subjects as history and geography or to gauge appreciation by means 
of tests. A large testing program was also included in the survey 
of Beaumont, Texas (65). Besides the Stanford achievement test, 
which was given to over 3,700 children, measurements were made 
in certain grades by means of composition scales and reading tests. 
Evaluation of instruction is made by comparison of children with 
grade and age standards. Much emphasis is placed on the use of 
the results of tests for diagnosis and remedial teaching. Five educa- 
tional tests were used in the survey of the schools at Fort Lee, 
N. J. (67). Comparisons of pupils with age and grade norms are 
made. The use of diagnostic tests to locate class and individual 
difficulties and the use of practice tests to remedy weaknesses in 
drill subjects are recommended. In the survey of the Cape 
Towns (30) large numbers of tests were given for the purpose of 
evaluating instruction in the elementary and secondary schools. All 
interpretations are based on the comparison of pupils with grade 
norms. Further testing is strongly recommended for the purpose 
of improving classification. The only printed survey conducted dur- 
ing the year in which no tests were used is the self-survey conducted 
by the local school officials in Hamtramck, Michigan. 


Uses of Tests in Improving Marks and Marking Systems. Sy- 
monds (63) argues for the basing of school marks in the school 
subjects exclusively on measurable achievement. He does not object 
to marks on studiousness or effort, nor to estimates of character traits; 
indeed, he recommends them. However, he feels that these should 
be reported separately, and not included in and confused with the 
marks in the school subjects. He thinks that standardized tests and 
teachers’ informal examinations should play a large part in deter- 
mining marks in measurable school achievement. Ruch and Stod- 
dard (58) recommend standardized educational tests as being valuable 
for purposes of supplementing teachers’ marks in determining 
promotion or nonpromotion. 

The advantages of the new type questions employed in mental 
and educational tests have impressed teachers, and as a result there 
has been a rapid spread in the use of new-type informal examinations 
for determining marks. Many articles deal with the use, the limita- 
tions, and methods for the improvement of these new type informal 
tests. Ruch and Stoddard (58), Symonds (63), and James (37) 
give directions for constructing them. Ruch and others (57) discuss 
in detail objective methods of examining in the social studies. The 
authors favor the use of improved informal examinations based on 
local curricula rather than the development of standardized tests 
for national use, because they fear that the latter may tend to per- 
petuate the traditional curriculum in such subjects. The results of 
several important experimental investigations are included. 

Tharp (69) compares the old with the new type examination in 
French. He concludes that for testing in grammar the new type 
examination is more economical in time and more reliable than the 
old type. Waples (77) advocates the multiple choice exercise as a 
device for directing the pupil in his study. Its chief use he thinks 
is in analyzing problems. Weinland (78) stresses the value of the 
true-false examination for purposes of discovering weaknesses in 
teaching as well as weaknesses in pupils’ knowledge. He feels, how- 
ever, that the content of the examination is of greater importance 
than the form in which the questions are framed. 

Wood (80) conducted an extensive experimental study to deter- 
mine the value of new type examinations in measuring achievement 
in foreign languages. His results, based on the analysis of examina- 
tions given to thousands of high school students, lead one to conclude 
that there is an important place for the new type exercises in 
teachers’ informal testing and in state examinations in languages. 
Valuable suggestions are given on the construction of examinations. 

Wood (81) studied the validity of different types of tests. She 
finds that a fifteen-minute new-type examination is more valid than 
a fifty-minute essay-type test. The completion type stood highest 
in point of validity. The validity of the true-false test was raised 
from .75 to .85 by correcting for chance. Ruch and Stoddard (58) 
feel that the new-type examination can test a more extensive sampling 
of knowledge in a given time than the essay type. They make a 
distinction, however, between an extensive sampling of knowledge 
provided by the new-type test and an intensive sampling supplied 
by the traditional written examinations. They conclude that the 
new-type examination is superior to the traditional essay type in 
validity and reliability per unit of testing time, but they remind the 
reader that ordinarily the new objective examination provides little 
opportunity for training in organization and expression of thought. 
They find that the validity of the true-false test is greatest when the 
examinees have been instructed not to guess and when the R-W 
method of scoring is used. Lohr (42) concludes that the reliability 
of the recognition type of test is decreased when much guessing 
occurs. He reports that completion type of test has higher relia- 
bility than true-false or multiple choice. Fritz (22) notes that on 
the true-false test students guess “true” rather than “false” in 
the ratio 62:38. The problem of cheating on the new-type examina- 
tion has been studied from one angle by Bird (5). Giles (25) sug- 
gests a modification of the multiple choice test in order to expose 
the student to a larger per cent of right statements. He recommends 
that in the case of each question all responses except one be true— 
in a few cases all should be true. The task would be the identification 
of the incorrect responses. Foster and Ruch (19) discuss the 
correction for chance in multiple choice tests. They conclude that 
the formula R − W/(n − 1), where R is the number right, W the number 
wrong, and n the number of choices per item, over-penalizes for guessing. Walker (75) 
studies mathematically certain questions suggested by the true-false 
test. Probably the most practical contribution is a table indicating 
the frequency with which 2, or 3, or 4, or 5, etc., true statements (or 
false statements) would occur in sequence by the law of chance. 
Arnold (2) studies the discrepancies between the results obtained 
by the true-false and the simple recall examination, and concludes 
that where the group tested is homogeneous in training the true- 
false test is more reliable, otherwise the simple recall type is more 
reliable. Hammond (28) studies the reliability of an informal English 
test consisting of three ten-minute sub-tests of the true-false, mul- 
tiple choice, and completion types. She finds the reliability of the 
whole test to be .92, which was distinctly higher than that found for 
the old type examination. 
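The correction for chance discussed above can be sketched as follows. The sketch assumes the conventional formula R − W/(n − 1). For true-false items (n = 2) this reduces to the R − W scoring that Ruch and Stoddard mention. Omitted items are neither rewarded nor penalized. A second function illustrates the kind of chance-sequence figures Walker tabulates: it gives the expected number of runs of exactly k consecutive true statements, on the hypothetical assumption that each item is keyed true independently with probability one-half. It is not Walker's own table, and all names are illustrative.

```python
from itertools import groupby, product

def corrected_score(rights, wrongs, n_choices):
    """Score corrected for chance: R - W / (n - 1).
    A pupil guessing blindly among n choices expects one right answer for
    every n - 1 wrong ones, so this subtraction cancels, on average, the
    gain from pure guessing. With n_choices = 2 it is R - W."""
    if n_choices < 2:
        raise ValueError("an item needs at least two choices")
    return rights - wrongs / (n_choices - 1)

def expected_true_runs(n_items, k, p_true=0.5):
    """Expected number of maximal runs of exactly k consecutive 'true'
    items in a key of n_items, each item independently true with
    probability p_true."""
    if not 1 <= k <= n_items:
        return 0.0
    q = 1 - p_true
    if k == n_items:
        return p_true ** k
    # A run touching either end of the test needs one 'false' neighbour;
    # an interior run needs a 'false' on both sides.
    return 2 * p_true ** k * q + (n_items - k - 1) * q ** 2 * p_true ** k

def expected_true_runs_brute(n_items, k):
    """Cross-check by listing every equally likely true-false key (p = 1/2)."""
    total = 0
    for key in product([True, False], repeat=n_items):
        total += sum(1 for value, run in groupby(key)
                     if value and len(list(run)) == k)
    return total / 2 ** n_items

print(corrected_score(40, 10, 2))   # true-false items: R - W
print(corrected_score(40, 9, 4))    # four-choice items
for k in range(2, 6):
    print(k, round(expected_true_runs(50, k), 3))
```

On a 50-item key, for example, the closed form expects roughly three runs of exactly two true statements by chance. The brute-force helper is practical only for short keys, but it confirms the closed form on them.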


Use of Tests in Diagnosis and Remedial Teaching. Gates (23) 
reports on the construction and use of diagnostic tests in primary 
reading. This work represents a significant contribution in the con- 
structing of diagnostic tests in light of the psychology of learning 
in the subject tested. Elsewhere (24) the same worker describes 
the method of constructing and validating his tests for the measure- 
ment and diagnosis of reading abilities in grades III to VIII. The 
tests are constructed on the theory that there are several types of 
reading activities and that the different types are dependent upon 
specific techniques, abilities, or skills which may be acquired in whole 
or in part by practice. Tests were selected to measure four aspects 
of reading: first, the ability to understand the general significance 
of a passage; second, the ability to use main ideas of a paragraph 
to solve related issues; third, the ability to understand precise directions; 
and fourth, the ability to note suggested details. Age and 
grade norms are reported. Facts on reliability are also given. Two 
equivalent forms are available. The author concludes that in the 
hands of an examiner of moderate skill the results based on both 
forms of the test will be sufficiently accurate for use in individual 
diagnosis. Reavis and Breslich (55) have published a diagnostic 
test in the fundamentals of arithmetic for grades VII to IX. No 
statistical information is given concerning reliability or validity. 
Thompson and Orleans (50) have devised two Latin tests which they 
recommend for diagnostic use. Potter and Touton (54) describe 
a test with a reliability of .80, which they feel to be of value in 
diagnosing errors in written composition. Grade norms from VII 
to XII and age norms from 11 to 19 are given. They recommend 
the pretest—teach—retest—teach formula, as does O’Hearn (46) 
also. Fowlkes (20) reports on the use of the results of inventory 
tests for diagnosis. Certain (13) finds that instruction in English 
and spelling is more effective when it makes use of the results of 
diagnostic testing. A plan for the use of tests in English teaching 
is presented in detail. 

While some authors recommend the use of tests in diagnosis, 
others affirm that diagnosis on the basis of present-day tests is likely 
to be fallible. Herron (33) claims that there is too much diagnosis 
and prognosis in the schools on the basis of inadequate tests. 
Kelley (38) makes a distinction between group diagnosis and individual 
diagnosis, but he says that a technique which is inaccurate in 
a study of individual cases can be discarded generally with little 
loss. He utters a warning against too great dependence on indi- 
vidual scores from the average test, and especially does he emphasize 
the need for radical improvement in intelligence and educational 
measures before a ratio between these two—such as the accomplish- 
ment quotient—will be sufficiently reliable for use in individual 
diagnosis. Symonds (63) states that in order to diagnose individual 
difficulties a test should contain several items of the same type so 
that real deficiencies may be distinguished from chance errors. 
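Symonds's requirement can be illustrated with a simple binomial model. Suppose a pupil who actually possesses a skill still slips on any single item with some small probability. Then one error tells little, whereas several errors among several items of the same type are unlikely to arise by chance. The sketch assumes independent items and an invented 10 per cent slip rate, and the function name is hypothetical.

```python
from math import comb

def prob_at_least_errors(n_items, min_errors, slip_rate):
    """Probability of at least min_errors wrong answers among n_items when
    each item is missed independently with probability slip_rate."""
    return sum(comb(n_items, e) * slip_rate ** e * (1 - slip_rate) ** (n_items - e)
               for e in range(min_errors, n_items + 1))

slip = 0.10  # hypothetical chance-error rate for a pupil who has the skill
print(prob_at_least_errors(1, 1, slip))   # one item, one miss
print(prob_at_least_errors(5, 3, slip))   # five items, three or more misses
```

At this slip rate, a single missed item occurs by chance one time in ten. By contrast, three or more misses out of five occur less than once in a hundred (about .0086). Several items of a type therefore let a real deficiency be told apart from a chance error.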

In several of the articles already mentioned, and in one of the 
surveys, reference was made to the use of practice tests for remedial 
work in certain skills. In addition to these, mention should be made 
of a report by Fowlkes (20) on the use of test material for diagnosis 
and practice in connection with multiplication combinations in the 
third grade. He finds that the experimental group, which used the 
test material, spent 70 per cent less time on certain combinations 
and yet was 50 per cent more efficient in solving them than other 
groups in the same school system. He suggests the use of tests in 
directing practice work more intelligently. Though the values of 
practice tests were emphasized fairly frequently in the literature, no 
new tests were reported. 


Use of Tests in Prognosis and Guidance. Symonds (64) urges 
more research both for the improvement of reliability of present 
measures of aptitude and also for the development of prognostic 
tests in new fields. He emphasizes especially the need for better 
measures of aptitude for clerical and mechanical pursuits. Friedl 
(21) reports some preliminary results obtained with a test of foreign 
language prognosis. The test proved to be fairly successful in 
predicting which students would receive percentage grades above 80 
and which would receive failing grades at the end of their first six 
weeks in study of foreign languages. Kelley (38) concludes that, 
where equally reliable tests are available, achievement tests in a 
given subject are to be preferred to intelligence tests for prediction 
of success in that subject. He thinks that educational tests are, all 
things considered, preferable at all levels between the second grade 
of the primary school and the third year of high school for prognosis 
of school success. He concludes from his study of test reliability 
that achievement test results are more dependable for prognosis of 
further school achievement than for diagnosis. Ruch and Stoddard 
(58) find that educational tests have a higher predictive value 
of future school success than teachers’ marks or intelligence tests. 
On the basis of a study of entrance test results and academic work 
of students at Bryn Mawr College, Crane (18) concludes that the 
best predictive measure of scholastic success in college is a com- 
bination of intelligence test scores and achievement ratings. 
Carter (11) finds that scores obtained on the Psychological Examina- 
tion of the National Council on Education, 1926 edition, correlate 
.45 with the first semester grades in English, while the scores on 
an informal test in English requiring only about one-third the time 
correlate .38 with these grades. Haddock (27) recommends educa- 
tional quotients based on the Stanford achievement test as important 
measures for predicting success in high school. She finds that 70 
per cent of those failing in the first year of high school were below 
the median E.Q. at the end of the elementary school period. 

Two articles present data on old prognostic tests. Blakey (6) 
finds a correlation of .67 between scores on the Wilkins prognostic 
test in modern languages and teachers’ marks. Bear (3) reports a 
correlation of .25 between results obtained on the Iowa physics 
aptitude placement test and one year’s grades in physics. He finds 
that the test is better for estimating general academic standing than 
for predicting success in physics. The correlation between the test 
scores and one year’s average academic grades was found to be .64. 

Several new tests have been devised, and two interesting proposals 
have been made concerning the extension of prognostic testing to 
untried fields. Zyve (82) gives a report on the construction and use 
of a test of aptitude for scientific study. The examination presumes 
no information beyond that ordinarily acquired by the end of the 
elementary school course. Correlations were obtained between the 
test results and scores assigned by competent judges to research 
students in physics, chemistry, and electrical engineering. In physics 
the correlation was .95 as determined from a study of 10 cases. In 
chemistry the correlation based on 21 cases was .77. In electrical 
engineering the correlation based on 19 cases was .89. The cor- 
relation between scores on this test and intelligence measures was 
low. From these facts the author concludes that the test measures 
to a large degree scientific aptitude rather than scientific training or 
general intelligence. Limp (40) describes a new battery of tests 
for use in predicting success in certain commercial subjects. He 
reports a correlation of .63 between test results and subsequent class 
grades in typewriting, and a correlation of .61 between test scores 
and grades in shorthand. Orleans and Solomon (49) report a Latin 
prognosis test. Stoy (61) has attempted to find some test for apti- 
tude in mechanical drawing. After discarding many measures on 
the basis of experimental evidence, he finally decides upon a battery 
of six tests which he recommends. Cox (17) suggests that a com- 
mittee of judges of art devise tests for ability in industrial arts, in 
the crafts and minor arts, in architecture, and in painting. Knight (39) 
recommends the construction of tests to measure teaching aptitude 
of prospective high school instructors. 


Bibliographies of Tests. Attending the rapid increase in the 
number of tests, there is a growing demand for annotated bibliog- 
raphies and selected lists. Several important lists of such a nature 
have appeared recently. Kelley (38) presents a list of the most 
important tests, and a ranking on the basis of merit is assigned to 
each test by from five to seven recognized authorities in the field 
of measurement. In a separate list the author includes extremely 
valuable facts concerning the reliability of each test; grades for which 
it is suitable; time required to administer and score; cost; publisher; 
etc. Ruch and Stoddard (58) give classified lists of tests which they 
consider especially suitable for use in junior and senior high schools. 
Certain selected tests are described in detail. Symonds (63) also 
discusses selected tests of the high school level. Odell (47, 48) has 
issued a second revision of his lists of elementary and high school 
tests. He includes only those which according to his judgment 
possess enough merit to warrant their use. A brief comment com- 
prising a few sentences is made on each test listed. 
PERSONALITY AND CHARACTER TESTS¹


BY MARK A. MAY, HUGH HARTSHORNE AND RUTH E. WELTY 
Teachers College, Columbia University 


In addition to the usual summary of articles dealing with per- 
sonality and character tests, we shall include here titles on ratings, 
general discussions and experiments bearing directly on this subject. 
The rapidly developing interest in the physiological and morphological 
aspects of personality in Europe has resulted in a sufficient number 
of articles and books to justify a special heading for them. Articles 
reporting the use of old techniques, because of interest in the results 
only, are listed in Section G. 

A. Summaries. No less than eighteen summaries have appeared 
during the calendar year of 1927. In addition to the 1926 summary 
of 196 titles by May, Hartshorne and Welty (80) the following 
authors have contributed lists which include references to the prob- 
lem of measurement in this field: Faris (34) summarizes the litera- 
ture from the general field of personality with reference to sociology. 
His summary contains 117 titles, some of which concern methods of 
research. Froemming (40) offers a short bibliography of character 
tests of 74 titles. Furfey (42) offers an annotated bibliography 
of 54 titles. Pangburn (88) has summarized the outstanding con- 
tributions of psychology to personality including the measuring move- 
ment. Roback’s (96) comprehensive bibliography of 3,341 titles con- 
tains references to measurement. It is complementary to the Manson 
bibliography published in 1926, overlapping it in only about 15 
per cent of the citations. Shuttleworth’s (107) bibliography contains 
116 titles covering the period from January, 1924, to October, 1927. 
Updegraph (128) refers to 57 scales and devices. G. B. Watson has 
two summary articles. The first (132) contains 167 titles each briefly 
annotated. The other (135) overlaps the first to a considerable 
degree but has supplementary references. In his recent text (133) 
he gives a carefully annotated list of actual tests. Kimball Young 
(145, 146) has two summary articles, both bearing on the field of 
social psychology. One contains 279 titles and the other 189. The 
classifications are of special interest. 

¹ This bibliography has been prepared in connection with an Inquiry in 
Character Education made possible by a grant to Teachers College from the 
Institute of Social and Religious Research. 


Summaries of more specialized nature have been made covering 
several aspects of personality. G. W. Allport (4) summarizes the 
work of the past dozen years on traits. He finds much confusion 
of terminology, and proposes a definition of “trait.” The bibliog- 
raphy has 46 titles. Farr (35) has a bibliography of 31 titles on the 
relation of morphological features to character and personality traits. 
Landis (65) summarizes the various methods of detecting deceit 
including association methods, respiratory and cardiac measurements. 
He concludes that there is good experimental evidence that deception 
can be detected by these methods, but doubts the practical significance 
of the results. Starbuck (114, 115) has summarized the work that 
is being done at the University of Iowa. A list of 30 titles is 
appended to Witty and Lehman’s study of “drive” (142). 

B. Batteries Including Various Assemblages of Tests Intended 
to Measure More Than a Single Trait. Baxter (7) reports fourteen 
tests of speed and ten of strength in an experiment in temperament 
types referred to later. 

Brotemarkle (16) reports the use of the Downey, the Pressey X-O,
and the Brotemarkle Comparison tests to secure an emotional rating of
college students. A social rating was secured from a personnel 
questionnaire. 

Cushing and Ruch (23) used eight paper and pencil tests in an 
experiment to determine whether such modes of testing could dis- 
tinguish the potentially delinquent children in public schools. Five 
of the eight tests proved satisfactory. None of the biserial r’s
between delinquent and nondelinquent groups, however, was large
enough for reliable prediction.
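A biserial r correlates a continuous test score with a two-way classification (here, delinquent vs. nondelinquent), on the assumption that the classification cuts an underlying normal distribution. The following is a minimal Python sketch of the usual textbook formula, offered only to show what the statistic measures; it is not the computation Cushing and Ruch report, and the function name and scores are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores: list[float], in_group: list[bool]) -> float:
    """Biserial r between a continuous score and a two-way classification.

    r_bis = (M_p - M_q) / sigma_t * (p * q / y)

    p, q      proportions of cases in the two categories
    M_p, M_q  mean scores of the two categories
    sigma_t   standard deviation of all scores
    y         ordinate of the unit normal curve at the point that divides
              its area into p and q
    """
    if len(scores) != len(in_group):
        raise ValueError("scores and in_group must have the same length")
    group_p = [s for s, g in zip(scores, in_group) if g]
    group_q = [s for s, g in zip(scores, in_group) if not g]
    if not group_p or not group_q:
        raise ValueError("both categories must contain at least one case")
    p = len(group_p) / len(scores)
    q = 1.0 - p
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    y = unit_normal.pdf(unit_normal.inv_cdf(q))
    sigma_t = pstdev(scores)
    return (mean(group_p) - mean(group_q)) / sigma_t * (p * q / y)


# Hypothetical scores; True marks a member of the delinquent group.
scores = [12, 15, 9, 20, 18, 7, 14, 11, 16, 10]
delinquent = [True, True, False, True, True, False, False, False, True, False]
print(round(biserial_r(scores, delinquent), 3))
```

A coefficient of this kind can be "satisfactory" in separating the groups on average and still be too small to classify individual children, which is the distinction the authors draw.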

Guthrie (47) gave 365 students three tests of introversion-extroversion,
viz., the Colgate Inventory, a Gossip test, and the Jung Association
test. The students also rated their instructors to give a
“judgment-of-others” ability score.

C. Tests and Techniques Intended Primarily to Measure Objec- 
tively (and Mainly in Terms of Conduct) Certain Personality Traits 
and Types of Behavior. 1. Deception. The technique of the “lie
detector” is described by Larson (66), who reports further work with
a modified Erlanger sphygmomanometer, combined with a device 
for recording inspiration-expiration curves. He refers also to an 
effort which is being made to study certain secretory and electrical 
changes. 


Objective measures of deception are reported in four articles. 
Clark (19) applied variations of the Cady and Voelker “peeping”
tests to 500 school children. The results show that from 25 per cent
to 30 per cent of the children cheated. Cushing and Ruch (23) compared
a group of delinquent girls with a control group in order to
validate several character tests, including the “false book titles” test
and the “overstatement” test. They found the difference in the case
of the “false book titles” test equal to 5.0 times its P.E., but with the
“overstatement” test the difference was only 2.2 times its P.E.
Woodrow and Bemmels (143) propose a modification of the Voelker
“overstatement” test as a measure of general character. They
secured correlations of .36 to .62 between honesty scores on the
“overstatement” test and character ratings of 31 nursery school and
kindergarten children. The intercorrelations of the ratings made by
the five teachers ranged from .64 to .83. The authors conclude that
the “overstatement” test is, as far as it goes, a good test of general
character for children of nursery school and kindergarten ages.
Yepsen (144) used the duplicating technique on 53 teachers, employ- 
ing the Ohio literacy test as test material. Duplicates of the papers 
were made and later returned to the students to be scored. He found 
that about 25 per cent of them cheated. The amount they cheated 
ranged all the way from eight times in eight opportunities to none in 
nine opportunities. 
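Differences stated as "5.0 times its P.E." are critical ratios: the difference between two group means divided by the probable error (P.E.) of that difference, where one P.E. is about 0.6745 standard errors. A ratio of about 4 (roughly 2.7 standard errors) was often taken as marking a reliable difference, so a ratio of 2.2 would be read as inconclusive. The sketch below assumes two independent groups summarized by mean, standard deviation, and number of cases; the helper names and figures are hypothetical, not Cushing and Ruch's data.

```python
import math

# One probable error is the half-width of the interval that holds the
# middle 50 per cent of a normal sampling distribution: about 0.6745 S.E.
PE_FACTOR = 0.6745


def pe_of_mean(sd: float, n: int) -> float:
    """Probable error of a sample mean."""
    return PE_FACTOR * sd / math.sqrt(n)


def difference_in_pe_units(mean1: float, sd1: float, n1: int,
                           mean2: float, sd2: float, n2: int) -> float:
    """Difference between two independent group means divided by its P.E.

    For uncorrelated means, P.E.(diff) = sqrt(P.E.1 ** 2 + P.E.2 ** 2).
    """
    pe_diff = math.hypot(pe_of_mean(sd1, n1), pe_of_mean(sd2, n2))
    return abs(mean1 - mean2) / pe_diff


# Hypothetical summary statistics for two groups of 100 cases each.
ratio = difference_in_pe_units(12.0, 4.0, 100, 10.0, 4.0, 100)
print(f"difference = {ratio:.1f} times its P.E.")
```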

2. Originality. McClatchy (81) correlated certain objective tests
of originality, such as the chain puzzle and analogies tests, with ratings.
The correlations were all very low.

3. Sociability. Using three tests and a questionnaire, Burke (17) 
attempted to measure sociability in 91 college students. A test requiring
the subject to recognize photographs once seen when presented
a second time along with new ones proved to be the best single 
measure of this ability. 

4. Social Perception. Dashiell (24) had children pick out one of 
four photographs that would match an incident in a story that was 
being told. This avoided the necessity of naming the emotions 
represented. 

D. Tests and Testing Techniques Intended to Measure Primarily 
the Affective Aspects of Personality. 1. Instincts and Emotions. 
a. Laboratory Techniques. Two studies of psychogalvanic reflexes 
are reported. Fleming (37) found that ratings for “magnetic personality”
correlated .44 with electric resistance measured with the
galvanometer, and ratings for “nervous temperament” correlated
.35 with the same. The multiple R between electric resistance and
ratings for the two traits combined is .65. The author suggests that
this study is worth extending. Wechsler and Jones (137) confirmed
the results secured by Whately Smith in his “Measurement of Emotion”
in showing that certain stimulus words are more effective
than others in eliciting galvanic responses. The principal new finding
of their study is that the effectiveness of a word depends
largely on its position in a series.
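Fleming's multiple R of .65 exceeds either zero-order r (.44 and .35). With two predictors this is possible only when the predictors overlap little or are negatively related, as the standard two-variable formula shows. In the Python sketch below, the intercorrelation r_12 between the two ratings is not reported in the review, so the values tried are assumptions for illustration: uncorrelated ratings would give R of about .56, while an assumed r_12 near −.26 yields a value close to .65.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of y with the best linear combination of x1, x2.

    R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2)
    """
    if abs(r_12) >= 1:
        raise ValueError("r_12 must lie strictly between -1 and 1")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Fleming's two coefficients with ASSUMED intercorrelations of the ratings.
for r_12 in (0.0, -0.26):
    print(r_12, round(multiple_r(0.44, 0.35, r_12), 3))
```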

b. Paper and Pencil Tests. Allen (2) used a modification of the 
word association technique formerly employed by Moore for measur- 
ing the relative strength of instincts. Ten types of instinct were 
chosen and ten stimulus words were assigned to each. Normal word 
association time was first established, and the average time for each
group of ten words was then compared with this norm. These records were
supplemented with a questionnaire and ratings from two friends of 
each subject. The Pressey X-O test was also applied. The various 
types of measurement had reliabilities averaging around .40. The 
results show that the relative strength of certain instincts and emo- 
tions may be determined by this method and also that there seems to 
be a common factor of general emotionality underlying all the traits 
studied. 

British norms for the Pressey X-O tests have been secured by 
Collins (20) by giving the tests to 1,500 children, ages eleven to 
fifteen, in England and Scotland. Important sex differences are 
noted. The boys have lower affectivity scores than the girls. The 
British norms are different from the American. The author also 
tested 100 delinquent boys, ages eleven to fourteen. Marked differ- 
ences were revealed between these and nondelinquent boys, indicating 
that these tests have diagnostic value, although marked changes occur 
in the records even after a short interval. The reliability of the 
Pressey X-O test has been studied by McGeoch and Whitely (82).
With college sophomores as subjects, the affectivity scores have
reliabilities of .51 to .86 when computed by separate tests. The
idiosyncrasy self r’s by tests run from .28 to .77. On the whole these
reliabilities tend to decrease with longer periods between testing. 
But within a forty-eight-hour interval the Pressey X-O tests have 
very satisfactory reliabilities for the age concerned. 

II. Mood and Temperament. The Downey will-temperament test
continues to attract attention. Downey and Uhrbrock (31) report 
further work on the reliability of the various sections of the test. 
Using as subjects 149 college women, they get reliability coefficients
of .63, .51, .31, and .46, respectively, for the four main sections of the
total battery. Using 42 junior high school boys as subjects, they get
.64, .09, .36, and .37 as reliability coefficients, and with 37 junior high
school girls they get .64, .57, .50, and .26. They conclude that satis- 
factory reliabilities may be obtained by rewording the directions for 
certain of the tests and lengthening them and by improving the tech- 
nique of administration so as to secure a certain mental set. In 
another paper Downey (29) discusses the validity of the group will- 
temperament test and reports certain partial correlations with school 
grades of junior high school pupils when intelligence is held constant. 
These partials vary from .23 to .46. She also reports a multiple
correlation of .69 between school grade and a combination of
intelligence score and score on will-temperament test VI-2 (writing
the phrase “United States of America” as rapidly as possible). The r
between grade and intelligence alone is only .56. She discusses the
possibilities of validation by securing differential scores between con- 
trasting groups, as, for example, delinquents and nondelinquents. 
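Downey and Uhrbrock's proposal to raise reliability by lengthening the tests can be given a rough quantitative form with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened k-fold with comparable material (with k = 2 it is also the usual step-up for split-half coefficients). The formula assumes the added items are parallel to the original ones, which the review does not establish; the Python sketch below merely applies it to the coefficients reported for the college women, with an assumed target reliability of .80 and hypothetical function names.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability of a test lengthened k-fold with parallel material.

    r_k = k * r / (1 + (k - 1) * r); with k = 2 this is the usual step-up
    applied to a split-half coefficient.
    """
    return k * r / (1 + (k - 1) * r)


def length_factor_needed(r: float, target: float) -> float:
    """How many times longer a test of reliability r must be made to reach
    the target reliability (the prophecy formula solved for k)."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - r) / (r * (1 - target))


# Section reliabilities reported for the 149 college women; .80 is an
# assumed target, not a figure from the study.
for r in (0.63, 0.51, 0.31, 0.46):
    doubled = spearman_brown(r, 2)
    k = length_factor_needed(r, 0.80)
    print(f"r = {r:.2f}: doubled length -> {doubled:.2f}; "
          f"about {k:.1f} times the length needed for .80")
```

Low coefficients such as .31 would require a very large increase in length under this model, which is one reason the authors also look to better directions and administration.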

Uhrbrock and Downey (127) have devised a nonverbal edition 
of the will-temperament test for use below the fifth grade and with 
illiterate adults. The reliabilities of the nonverbal tests (they are 
twelve in number) range from .08 to .82 with a group of junior high 
school boys, and from .21 to .85 with a group of junior high school 
girls. The correlations between the twelve nonverbal tests and the 
corresponding twelve verbal tests range from .02 to .53, and 
average .24. 

Garth and Barnard (43) compared the will-temperament scores
of 170 full-blood Indians with scores of 101 white students in a
Denver high school (all but seven were over seventeen years of age). The results
show no significant difference in total success but rather marked 
difference in certain of the separate tests. The greatest differences 
are in motor inhibition, where 72 per cent of the Indians exceed the 
white median, and in speed of decision, where 18 per cent of the 
Indians exceed the white median. Kornhauser (64) found that the 
average first-year marks of 111 freshman students in the School of
Commerce and Administration of the University of Chicago corre- 
lated around zero with scores on the will-temperament tests. The 
r’s with ratings on traits of industry, accuracy and initiative, made by 
instructors and fellow students, were also very low. Roe and 
Brown (98) gave the will-temperament tests to students of dentistry, 
50 seniors and 30 freshmen, and correlated the results with faculty
members’ predictions of the students’ success. The correlations
ran uniformly low (some even slightly negative), the highest
being +.44, the average around zero.

An experimental investigation of the traditional four-fold classifi- 
cation of temperaments into quick-strong, quick-weak, slow-strong, 
and slow-weak, is reported by Baxter (7). Strength and speed of 
responses in a wide variety of situations were measured separately and 
objectively. There were fourteen measures of speed and ten of 
strength. In addition, speed and strength of certain physiological 
processes were measured objectively. Also ratings on strength and 
speed were secured. The objective measures all have satisfactory 
reliabilities. The results of this very elaborate and careful study 
show that the subjects do not fall into any sharply defined groups 
with respect to the traditional four temperaments. High degrees of 
specificity prevail throughout. The inter r’s of the speed tests are 
all around zero, and the same is true of the strength tests. 

The effect of periodic physiological changes on female temperament
was studied by Conklin, Byrom and Knips (22), who report
a tendency toward introversion in proportion to the severity of the 
period. 

III. Attitudes, Interests, Preferences, Prejudices, etc. a. Specific 
Attitudes. 1. Occupational Attitudes. Anderson (5) had 609 stu- 
dents in the University of North Carolina rank twenty-four occupa- 
tions “in order of social standing.” The different occupational 
groups among the students showed remarkable agreement of opinion. 
Davis (26), adapting a list used in America, had Russian children 
and Russian workers rank various occupations according to their 
position in the social scale, with the occupation “most looked up 
to” at the top. Although the Russian groups differed from one another,
they agreed in reversing the typical American opinion with regard
to the status of banker, business man, and minister.

2. Conservative-Radical Attitudes. Using the Moore questionnaire
for determining radical or conservative temperaments, Washburn
et al. (130) found Vassar women about as radical as Moore
found the Yale and Dartmouth men. Moore’s other results (J. Abnorm.
and Soc. Psych., 1925, 234-244) were not confirmed: the more
radical were compared with the more conservative in respect to
intelligence, mirror drawing, reaction time, card sorting, and free
association, and in none of these comparisons were the differences significant.
Reed (92) reports the results of a questionnaire similar to Moore’s
but covering a wider range. He finds that college students are not
consistently radical or conservative but are rather mixtures, being
radical on some topics and conservative on others, with a tendency
toward liberalism.

3. Race Attitudes. Bogardus (10) presents an analysis of the
type of behavior stimuli that tend to reduce social distance. Frederick
(39) found, by means of a true-false test of twenty-five questions,
that 1,116 high school students were grossly ignorant of international
affairs and showed a high degree of race prejudice and an
unintelligent patriotism. Orata (87) reports a statistical study of the
factors that tend to reduce race prejudice among college students.
The facts were gathered by means of an information test covering
Oriental affairs and a questionnaire. The most important factors
reducing prejudice are age, culture courses, and culture societies, but
even for these the correlations are only about .33 and .34.

4. Religious Attitudes. Bain (6) submitted seventeen questions 
on religious topics to 200 college students. He found much more 
liberal attitudes than Leuba found with similar questions in 1916. 
Sturges (121) inquired concerning doctrines taught in the Sunday 
school attended by the subject, doctrines believed then, and doctrines 
held now. His results show that on the whole 27 out of 40 doctrines 
find fewer advocates among college students than among Sunday- 
school students. 

5. Social Attitudes. Cavan and Cavan (18) report a statistical 
study of the attitudes of young business women toward home and 
married life. Harper (49) contributes a scale for testing the social 
beliefs and attitudes of adults, with a (retest) reliability of .90, and 
reports its application to 2,900 educators in all states of the Union. 
G. B. Watson (136) describes his test of fairmindedness and its 
results. Social attitudes as revealed by questionnaires concerning 
social activities have been studied by Stanforth (112), Terry (122, 
123), and Trow (125). 


b. Interests and Preferences. Two studies have appeared relating 
occupational interests to abilities. Fryer (41) endeavored to find 
out whether or not young people chose occupations lying within their 
mental abilities. He found that choice of vocation is uncorrelated 
with amount of intellect required to succeed in that vocation. Korn- 
hauser (63) found that score on a vocational interest blank based on 
Freyd’s correlates very low with scholarship and intelligence among 
students in the School of Commerce and Administration of the University
of Chicago. Strong (116, 117, 118, 119) has contributed four
papers on the possibilities of vocational differentiation by the use of 
vocational interest tests. In one article (119) the tests are described 
in full; in the other three the results on executives, certified public 
accountants, and engineers are given. 

c. Measures of Motivation. Hurlock (59) reports that group 
rivalry increases both the quantity and quality of school work, espe- 
cially among children of inferior ability. Ross (100) finds that 
knowledge of progress in a simple muscular skill increases the rate 
of improvement from 2 per cent to 12 per cent. 

E. Tests and Techniques Intended to Measure Primarily Social- 
Ethical Ideas and Judgment. Six articles by Hartshorne, May 
et al. (52) on testing the knowledge of right and wrong are now 
available in a single monograph, which describes in some
detail a battery of moral knowledge tests used by the Character
Education Inquiry. Sturges (120) has contributed a study on the 
use of opinion tests in determining changes in attitude. 

Rosner (99) reports a list of acts which are written on cards and 
then arranged by the subjects in order of seriousness. Differences 
among the subjects are attributed to environment and mental maturity. 

Dearborn (27) submitted a questionnaire concerning honesty to 
259 third and fourth grade children. The purpose of this study was 
to secure children’s ideas of what constitutes honesty. Wide indi- 
vidual differences exist among children’s ideas as to what is honest 
and what is not in a great many situations. Blomfield (9) applied a 
multiple choice and true-false test of the comprehension type to 167 
Sunday-school children. No wide differences between juniors and
seniors in the same Sunday school are apparent, except with regard
to a few social questions. 

F. Ratings and Self-Rating. Guthrie (48) found that the relia- 
bility of students’ ranking of instructors with respect to quality of 
teaching was .89. Kornhauser (61) found that the correlations 
between ratings by the same instructor at different times are around 
Q and the r’s between one instructor’s ratings and another’s on the
same students are around .40. In another paper Kornhauser (62)
shows that the same average rating may mean something quite differ- 
ent in different traits. He finds further that the inter r’s of trait 
ratings are usually so high that a rating on a few traits will give one 
about as good a picture of the personality of the subject as ratings 
on many. A. H. Miller (83) reports on the rating schemes employed
in thirty-three high schools in New York City. 

By securing self-ratings and ratings of others on the Heymans 
and Wiersma list of traits from 80 subjects in groups of ten each, 
Adams (1) found that persons who are the best judges of themselves 
are open-minded and sympathetic and lack self-consciousness. The
better raters of others are egotistic, cold-blooded, and anti-social.

F. H. Allport (3) contributes a discussion of the problem of self-rating,
dealing with attitudes that act as obstructions to true self-rating.
Howells (56) suggests a self-rating scheme for determining radical 
or conservative attitudes in religious beliefs. The proposed device 
has a split-form reliability of .85. Differences between radicals and 
conservatives in intelligence, suggestibility, etc., are reported. Hurlock
(58) secured self-ratings on the traits listed in the Downey test
No. 7 on 425 public school children, grades seven and eight. The 
children simply checked the words indicating the traits they thought 
they possessed. Only 6 per cent of the total number of checks were 
on undesirable traits. She concludes that self-rating schemes for 
children are liable to give results of uncertain value. 

Heidbreder (54) reports a combined rating and self-rating scheme 
for studying inferiority complexes in the case of 120 men and 148 
women. The scale consisted of 137 traits, which, if taken in one 
direction, are symptomatic of inferiority complexes. The scale has 
a reliability of .73 and yields a normal distribution rather than marked
groups. By taking the upper and lower quartiles, however, she was
able to select the traits differentiating the extremes. Hoopin- 
garner (55) proposes an elaborate self-rating and self-analysis scheme 
for determining whether or not one has the personality traits that 
contribute to business success. Shuttleworth (106) used a self-rating 
scheme for studying the effects of early home religious training on 
religious attitudes and practices in college. He reports a split-form
reliability of .92. The correlations between the self-rated early religious
training items and present beliefs average .208; between early
religious training and present religious practices the average r is .436;
between early religious training and cheating, zero. Trow and
Pu (126) found that 21 Chinese students tend to underrate them- 
selves in six traits to the average extent of about 7.4 points on a scale 
of 100, as compared with the ratings given them by the others in the 
group. In contrast with American students, who tend to overrate
themselves, this study shows a marked racial characteristic.
G. Experiments Involving Quantitative Studies. 1. The Relation
of Bodily Structure to Personality Traits. The work of Kretschmer 
seems to have stimulated several investigators to pursue this type of 
study further. Farr (35) made physical measurements on 70 sub- 
jects and compared the results with intelligence tests and behavior 
adjustments. The results show a rather definite association of intro- 
vert and schizoid personalities with the slender and elongated body 
types. A bibliography of 31 titles is appended. E. Miller (84) dis- 
cusses psychological types and their relation to morphology in a small 
volume. Mohr and Gundlach (86) sought Kretschmer types among 
convicts in the Illinois State Penitentiary. They found the distribu- 
tion of types about the same as Kretschmer found in the population 
at large. They found important differences between the asthenic
type and the pyknic type in alpha scores. The r between alpha score
and index of build was found to be −.34. The Kretschmer types
were also compared in respect to tapping rate, speed of writing, reac- 
tion time, writing with distraction, Franzen dotting test, Young’s 
light series, a cancellation test, color fusion, substitution test, writing 
backwards, information test, one test not described, and a variety of 
social data. The authors conclude from their results that we are 
scarcely justified in retaining the concept of “types.” Wertheimer 
and Hesketh (139) report a study that differs somewhat from Kret- 
schmer’s but still aims to find body types corresponding to psychiatric 
clinical types. The number of cases studied was too small to arrive 
at definite conclusions. 

Sheldon has contributed three papers on the relation between 
certain morphological indices and personality traits. In one 
study (102) he reports an r of .136 between morphological index (M.I.)
and score on the American Council on Education’s Psychological
Examination. In a second study (103) he reports correlations between
various morphological measurements, morphological index being only one,
and ratings on five personality traits. The ratings have a reliability of
.88. The r’s between the physical measurements and the rated traits are all
around zero. The highest is −.217 between M.I. and sociability.
In a third article (101) he reports the results of an attempt to relate
facial measurements to rated personality traits. The r’s between
various head measurements and five traits run from −.154 to +.304,
with an average around zero. The same holds when ratios between 
head measures are correlated with the ratings. 


In addition to the search for personality types corresponding to 
morphological types there is also under way in Europe a movement
toward biotypes, or biological types. Pende et al. (89) report on 100
cases studied at the Biotypological Orthogenic Institute at Genoa.
Biotypology includes all aspects of the personality and seeks funda- 
mentally general types. Jaensch (60) has contributed an entire 
volume devoted to the study of biotypes among normal children. He 
reports the characteristics of two major biotypes which he designates 
as “ T” and “ B,” with much supporting experimental data. 

Three efforts to find personality differentiae in other types of 
analysis have been reported. Raphael et al. (91) tried to differentiate
the two major psychiatric clinical types of personality by blood group, 
but without success. Rich (93) reports negative correlations between 
traits of leadership and aggressiveness and amount of acid in the 
urine, and also a negative r between leadership and creatinine excre- 
tion. Travis (124) reports that psychoneurotic and schizophrenic 
clinical types may be differentiated by measuring the auditory and 
visual thresholds. 

II. Behaviors and Traits. 1. Collecting. Lehman and Witty (77),
using the Lehman play quiz, report data showing that only about
10 per cent of 5,000 Kansas public school children engaged
in the sport of collecting and hoarding. They compare this with the 
results of a study made thirty years ago by Burke in which it is 
reported that 90 per cent of children engage in such activities. 

2. Deception. Doring (28) analyzes sixty cases of children’s lies, 
showing that each statement of the child exhibits such elements as 
transformation, exaggeration, invention, phantasy, suggestibility, 
anxiety. Fenton (36) reports on three types of classroom situations 
in which opportunity for cheating on examinations was given. The 
fact of cheating was determined by having three observers (all stu- 
dents) placed in the room so that each could observe eleven other 
students. Sixty-three per cent of the group were reported as having 
cheated in at least one situation. High grades, higher intelligence, 
and experience in high school honor systems were all associated with 
greater honesty. G. F. Miller (85) reports a technique for detecting 
classroom dishonesty which consists mainly in “planting” errors in
the marking given to students’ papers by the instructor, and then 
allowing the students to check the markings on their own papers. The 
point is that the dishonest student will say nothing about errors that 
are in his favor, but will call attention to errors that lower his record. 
Of thirteen students in one group who saw that their papers were marked too
high, only one made the correction; of twelve in another group, seven
made the correction. The obvious difficulty with this technique is
that individual cheaters cannot be detected. Witty and Lehman (141)
have sharply criticized the interpretations placed on the results of
the “overstatement” and “false book titles” tests. They tested a
group of fifty gifted children and a control group
with these two tests. They interpret the results specifically in terms
of the test situation rather than in terms of general character traits.

3. Play Interests. K. M. B. Bridges (13) studied the persist- 
ence of play interests in three-year-olds. Lehman, Witty, and 
others (67-76) have made elaborate studies of the association between 
play interests as revealed in the Lehman play quiz and other factors, 
such as character traits, school progress, Sunday-school attendance, 
growth, and talent.

4. Social Perception and Recognition. By cutting photographs 
into two parts, horizontally through the bridge of the nose, so the 
eyes are in one part and the mouth in the other, Dunlap (32) was able 
to determine the relative importance of eyes and mouth in judgment 
of emotions. He found that the expression of the mouth is the pre- 
dominant factor in the act of judging. G. S. Gates (44) compared 
the auditory and visual elements as factors in the recognition of emo- 
tional states. Tentative norms are reported. Sherman (104) found 
that 119 graduate students in psychology and 50 medical students 
were much more “successful” in naming an emotional response in an
infant when the stimulus was known than when the stimulus was not 
known. This experiment shows how difficult it is to name an emo- 
tion when the only datum at hand is the response. In another 
article (105) 22 graduate students are compared with respect to their 
ability to recognize the emotions supposed to be expressed by a 
trained vocalist and those supposed to be exhibited by the cries of 
infants. 

Further work on the Moss social intelligence test is reported by 
Hunt (57). 

5. Stealing. Riddle has two articles dealing with stealing. The 
first (94) deals with the relation of stealing to sex, intelligence, and 
chronological age. The subjects were 435 psychiatric clinic cases. 
The average I.Q. of 190 who were known to steal was 78 ± 1.02 and
the average of 68 cases known not to steal was 70 ± 1.98, and of 177
about whom there was no stealing record, 66 ± 1.18. The difference
in M.A. between those who steal and those who do not is four times
its P.E., those who steal averaging 10-4 and those who do not 8-11.
The second article (95) analyzes and classifies the different kinds of 
theft. Stealing from home is the largest item. Each kind of theft 
is related to the C.A. and I.Q. of the thief, the more aggressive types 
of stealing being associated with greater age. 

III. Moral Concepts and Ideals. Brotemarkle’s (16) emotional
rating battery showed correlations with college grade, mental competency,
general intelligence, and social rating of .02, .04, .004, and .0003,
respectively. Slavens and Brogan (108) secured rankings of the 
Brogan list of 15 bad practices from 400 high school students. The 
intercorrelations of the rankings by the various groups all exceed .90 (with one
exception), some being as high as .97. G. B. Watson (134) reports
an attempt to measure the value of summer camps in terms of 
responses on a series of paper and pencil tests. Specific gains and 
losses in scores in different camps are given. Williams (140) had 449 
junior high school pupils name twenty-five leaders, and analyzed the 
results so as to compare the categories reported and the choices of 
the two sexes. 

IV. Miscellaneous. The Colgate mental hygiene test has been 
used in studying extroversion-introversion or neurotic tendencies, or 
both, by Davenport, Downey, Elwood, Guthrie, and Heidbreder. 
Davenport (25) found that inspectors are more introverted than fore- 
men. Downey (30) reports tentative findings suggesting that dextral 
asymmetry may be correlated with introversion. Elwood (33) finds 
that girls who take up nursing are decided extroverts. “The average
nurse of the group tested was more extrovert than 94 per cent of all 
women entering college.” Nurses were also found to be more stable 
than the college girls. Guthrie (47) found that the Colgate test form 
C-2 (the personal inventory) has a reliability of .60, using 365 college 
students as subjects. He also finds that it correlates .01 with intel- 
ligence and .11 with scholarship. It also correlates low with other 
evidences of introversion-extroversion. The implication is that these 
contrasting types are not as pronounced as is commonly believed or 
are more specialized. 

Heidbreder (53), using a scale similar to Laird’s, finds no sex
differences in respect to average scores, yet certain sex differences do 
appear on the separate test items. Conklin (21), on the contrary, 
reports extreme sex differences on his scale, which distributes his 
population normally. 

J. W. Bridges (11) reports interesting sex differences on the 
Woodworth test for emotional instability among college men and
women. The men show fewer symptoms of instability than women, 
but they exhibit greater variability in symptoms. He also finds that 
men students in arts courses show more symptoms of instability than 
men in medical courses, possibly because of difference in age. The 
same is true of women. The Woodworth score correlates zero with 
alpha and with college marks. The Kent-Rosanoff test and the 
Pressey X-O were also given to 27 arts students. The inter r’s were 
all insignificant. 

Bridges (12) also gave both the Woodworth and Woodworth- 
Mathews questionnaires to 33 delinquent girls and found that they 
are more emotionally unstable than ordinary girls, especially at the 
younger ages, and further that delinquent girls are more like delin- 
quent boys in respect to emotional instability than ordinary girls are 
like ordinary boys. The author attributes the abnormal symptoms of 
delinquents chiefly to broken home life. 

Briggs (14) requested a large group of graduate students to recall 
whether praise or censure stimulated them to greater effort in their 
high school work. The results confirmed the previous work of Laird 
showing that praise is more effective than censure or sarcasm. 
Brill (15) reports an analysis of motives for conduct disorders in 
boys. Foster (38) lists the personality traits of the jealous child as 
compared with the nonjealous child and gives evidence to show that 
jealousy is the product of certain unfavorable home conditions. 
Goodenough and Leahy (46) discuss the effects of being the oldest, 
middle, youngest, and only child on a variety of rated traits. Wide
differences appear between the oldest and the youngest child, and the only
child in general seems to exhibit fewer evidences of abnormality than
the others. Mackaye (78) has made an analysis of the dates, causes,
and permanency of vocational ambitions and their fixations in 400 
high school students. South (109,110) compared the relative effi- 
ciency of committees of three members and of six members in judg- 
ing the Feleky photographs. When working in groups of three, speed 
and accuracy both were greater than when working in groups of six. 
A committee all of one sex is more efficient than a committee of both 
sexes. Spearman’s notable work (111), The Abilities of Man, con-
tains a mathematical test for the presence of a common factor in a
series of measures, a test of the greatest importance in the study
of character traits and their interrelations. Wells (138) reports an





436 M. A. MAY, H. HARTSHORNE AND R. E. WELTY 


investigation into the physiological processes occurring during an act 
of voluntary choice. 
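Spearman’s criterion can be illustrated with the tetrad difference: for four measures a, b, c, d it is r_ab*r_cd - r_ac*r_bd (the two other pairings are formed the same way), and if a single common factor accounts for all the intercorrelations, every tetrad difference is zero apart from sampling error. The Python sketch below is an illustration under that assumption only. The function name `tetrad_differences` and the factor loadings are hypothetical, and the sketch makes no allowance for sampling error in judging whether an observed difference departs from zero.

```python
from itertools import combinations

def tetrad_differences(r):
    """Return the three tetrad differences for every set of four measures.

    r is a symmetric correlation matrix given as a list of lists.
    (Hypothetical helper, written only to illustrate the criterion.)
    """
    result = {}
    for a, b, c, d in combinations(range(len(r)), 4):
        t1 = r[a][b] * r[c][d] - r[a][c] * r[b][d]
        t2 = r[a][b] * r[c][d] - r[a][d] * r[b][c]
        t3 = r[a][c] * r[b][d] - r[a][d] * r[b][c]
        result[(a, b, c, d)] = (t1, t2, t3)
    return result

# Hypothetical loadings of four tests on one common factor;
# under that model each correlation r_ij equals g_i * g_j.
g = [0.9, 0.8, 0.7, 0.6]
r = [[1.0 if i == j else g[i] * g[j] for j in range(4)] for i in range(4)]

for quad, diffs in tetrad_differences(r).items():
    # All three differences vanish, up to floating-point rounding.
    print(quad, [round(t, 6) for t in diffs])
```

If r is replaced by a matrix in which tests 0 and 1 and tests 2 and 3 form two separate clusters (say r_01 = r_23 = .8, all other correlations .1), the first difference becomes .63, which is the kind of departure from zero the criterion is meant to detect.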

H. Observation and Record Keeping. Although observation is not strictly
measurement, there is a strong tendency to develop techniques for
quantifying observations. Only two studies can be reported at this
time, but studies of this kind will undoubtedly increase in number and
importance. Blatz and Bott (8) report the results of an extended
period of observation by teachers on the misdemeanors of 1,400 
school children. The incidence of different types of social failure is 
given. Gesell and Lord (45) give the results of detailed observation
of two groups of nursery school children, eleven from well-to-do
families and eleven from poor families. Comparisons are drawn
between the groups with respect to such traits as spontaneity, play
initiative, poise, and self-care, and the authors assert that certain basic
psychological factors permanently differentiating such groups are already
in operation at this age.

I. Discussion Articles. A number of articles have appeared that report
no experimental work but discuss methods and results. Those more
directly related to tests and measurement are listed here.

Hartshorne and May (50, 51, 79) discuss various aspects of the
work of the Character Education Inquiry. Podach (90) reviews 
many of the leading European studies on the relation of body form 
and body chemistry to character and temperament. Roback’s (97) 
Psychology of Character is the outstanding contribution of the year 
1927, reviewing a wide range of literary and psychological material. 
Starbuck (113) discusses various methods that have been employed 
in the scientific study of character. Valentine (129) devotes a chapter 
to measurements of personality. G. B. Watson (131) discusses char- 
acter tests in general, summarizing some of the more outstanding 
facts. Witty and Lehman (142) discuss “drive” as an important 
factor in all testing, giving a bibliography of thirty titles. 
NOTES AND NEWS

On the occasion of the Linguistic Congress at The Hague in
April, an International Society of Experimental Phonetics was
founded. Professor E. W. Scripture of Vienna was elected president
and Professor L. Zwaardemaker of Utrecht was elected an honorary
member. The object of the society is the promotion of scientific
research in experimental phonetics.

Professor Herbert Woodrow, at present head of the department
of psychology at the University of Oklahoma, has been
appointed professor and head of the department of psychology of the
University of Illinois to succeed Professor Madison Bentley.

The BULLETIN announces the appointment of Professor John E.
Anderson, Director of the Institute for Child Welfare of the University
of Minnesota, as coöperating editor of the BULLETIN in charge
of the field of child development to succeed the late Professor Bird
T. Baldwin.

Dr. Henry E. Starr, assistant professor of psychology at the 
University of Pennsylvania, has been appointed professor of 
psychology at Rutgers College. 

Dr. Harry Helson of the University of Kansas has been
appointed associate professor in experimental psychology and 
director of the laboratory of psychology at Bryn Mawr College. 

Dr. J. P. Guilford of the University of Kansas has been
appointed associate professor of psychology and director of the 
psychological laboratory of the University of Nebraska. 

At the recent meeting of the American Academy of Sciences the 
following were among those elected to foreign honorary membership: 
Professor Wolfgang Köhler of the University of Berlin and Pro-
fessor Karl Pearson of the University of London. 